This notebook will walk through setting up a model for the semantic segmentation of images using convolutional neural networks (CNNs). Unlike classification, where we attempt to predict a label associated with an image (e.g. cat or dog), in semantic segmentation, we are trying to label each pixel within an image. This is usually done by providing a corresponding mask for each training image that indicates which pixels belong to which class. The example used here is based on a set of aerial images taken across Dubai and used in a Kaggle competition:
https://www.kaggle.com/datasets/humansintheloop/semantic-segmentation-of-aerial-imagery
There are a total of 72 images and masks in this dataset. In the interest of making this tractable in a class, we’ll train the model using just a subset (18) of these images, and only for a few epochs. With a relatively small dataset, the goal of this lab is to demonstrate how to build and evaluate these models. I would not expect a very high level of accuracy without increasing both the size of the dataset and the number of epochs.
Code for the UNet model in this example has been modified from https://github.com/r-tensorflow/unet/tree/master
First, let’s load some libraries
library(fs)
library(tensorflow)
library(keras3)
Attaching package: 'keras3'
The following objects are masked from 'package:tensorflow':
set_random_seed, shape
Next, we’ll get the images. These are available through the class Google drive in the zip file unet_images3.zip. Download this now, and move it to a folder that is easy to find on your computer, and unzip it. This will create a set of folders that look like this:
- images3
- images
- masks
In each of these you’ll find matching images. The images folder contains the RGB images as JPEGs, and the masks folder contains the matching masks as PNG files. The file names should match, so image_part_001_000.png will be the mask for image_part_001_000.jpg. These files are smaller tiles created from the original images. If you want to see what the original images look like, download and unzip the file unet_images2.zip. If you have this, you can load an example of each. First, we’ll make a couple of functions to display images and masks using keras functions:
display_image_tensor <- function(x, ..., max = 255,
plot_margins = c(0, 0, 0, 0)) {
if(!is.null(plot_margins))
par(mar = plot_margins)
x |>
as.array() |>
drop() |>
as.raster(max = max) |>
plot(..., interpolate = FALSE)
}
display_target_tensor <- function(target) {
display_image_tensor(target, max = 5)
}
Now get the list of full images:
data_dir <- path("./datafiles/images2/")
input_dir <- data_dir / "images/"
target_dir <- data_dir / "masks/"
image_paths <- tibble::tibble(
input = sort(dir_ls(input_dir, glob = "*.jpg")),
target = sort(dir_ls(target_dir, glob = "*.png")))
And here’s the first image:
image_paths$input[1] |>
tf$io$read_file() |>
tf$io$decode_jpeg() |>
display_image_tensor()
And the corresponding mask:
image_paths$target[1] |>
tf$io$read_file() |>
tf$io$decode_png() |>
display_image_tensor()
Now let’s take a look at the tiles in images3/. We’ll make a list of the full paths to both images and masks for use in training the model:
data_dir <- path("./datafiles/images3/")
dir_create(data_dir)
input_dir <- data_dir / "images/"
target_dir <- data_dir / "masks/"
image_paths <- tibble::tibble(
input = sort(dir_ls(input_dir, glob = "*.jpg")),
target = sort(dir_ls(target_dir, glob = "*.png")))
image_paths
# A tibble: 2,016 × 2
input target
<fs::path> <fs::path>
1 ./datafiles/images3/images/image_part_001_000.jpg …sks/image_part_001_000.png
2 ./datafiles/images3/images/image_part_001_001.jpg …sks/image_part_001_001.png
3 ./datafiles/images3/images/image_part_001_002.jpg …sks/image_part_001_002.png
4 ./datafiles/images3/images/image_part_001_003.jpg …sks/image_part_001_003.png
5 ./datafiles/images3/images/image_part_001_004.jpg …sks/image_part_001_004.png
6 ./datafiles/images3/images/image_part_001_005.jpg …sks/image_part_001_005.png
7 ./datafiles/images3/images/image_part_001_006.jpg …sks/image_part_001_006.png
8 ./datafiles/images3/images/image_part_001_007.jpg …sks/image_part_001_007.png
9 ./datafiles/images3/images/image_part_001_008.jpg …sks/image_part_001_008.png
10 ./datafiles/images3/images/image_part_001_009.jpg …sks/image_part_001_009.png
# ℹ 2,006 more rows
If we plot the first image, you should see that it is the top-left corner of the original image
image_paths$input[1] |>
tf$io$read_file() |>
tf$io$decode_jpeg() |>
display_image_tensor()
We’ll load the matching mask as well. Note that this has been converted to an integer mask, with 6 possible classes:
Building = 0
Land = 1
Road = 2
Vegetation = 3
Water = 4
Unlabeled = 5
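As a quick illustration of this encoding (plain R, separate from the model pipeline), a small integer mask can be mapped back to class names. Note that the 0-based codes need a +1 when indexing R’s 1-based vectors:

```r
## Class names in code order (0 = Building, ..., 5 = Unlabeled)
class_names <- c("Building", "Land", "Road", "Vegetation", "Water", "Unlabeled")

## A toy 2x2 integer mask using the 0-based codes from the PNG files
toy_mask <- matrix(c(0L, 2L, 4L, 5L), nrow = 2)

## Add 1 to shift from 0-based codes to 1-based R indexing
labels <- matrix(class_names[toy_mask + 1L], nrow = 2)
labels
#      [,1]       [,2]
# [1,] "Building" "Water"
# [2,] "Road"     "Unlabeled"
```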
image_paths$target[1] |>
tf$io$read_file() |>
tf$io$decode_png() |>
display_target_tensor()
Next, we’ll create two tensorflow datasets that hold the images. As this is a fairly small dataset, we’ll simply read the images into memory. For larger sets, we would need to create a data generator here. We’ll first make a couple of helper functions:
library(tfdatasets)
Attaching package: 'tfdatasets'
The following object is masked from 'package:keras3':
shape
tf_read_image <-
function(path, format = "image", resize = NULL, ...) {
img <- path |>
tf$io$read_file() |>
tf$io[[paste0("decode_", format)]](...)
if (!is.null(resize))
img <- img |>
tf$image$resize(as.integer(resize))
img
}
tf_read_image_and_resize <- function(..., resize = img_size) {
tf_read_image(..., resize = resize)
}
make_dataset <- function(paths_df) {
tensor_slices_dataset(paths_df) |>
dataset_map(function(path) {
image <- path$input |>
tf_read_image_and_resize("jpeg", channels = 3L) ## Reads images (3 channels)
target <- path$target |>
tf_read_image_and_resize("png", channels = 1L) ## Reads masks (1 channel)
# target <- target - 1
list(image, target) ## Stores image and corresponding mask
}) |>
dataset_cache() |> ## Dynamically caches the images
dataset_shuffle(buffer_size = nrow(paths_df)) |> ## Shuffles images between runs
dataset_batch(32)
}
Now let’s create the dataset. First, we’ll define the input image size - here we keep the images at their original size (128x128), but this resizing can be used to ensure all input tensors are the same size if the tiles differ. Second, we define the number of images to be used for validation (roughly 25% of the inputs). Third, we split the list of file names into training and validation. And finally, we make the two datasets.
img_size <- c(128, 128)
num_val_samples <- 500
val_idx <- sample.int(nrow(image_paths), num_val_samples)
val_paths <- image_paths[val_idx, ]
train_paths <- image_paths[-val_idx, ]
validation_dataset <- make_dataset(val_paths)
train_dataset <- make_dataset(train_paths)
We’ll finish this section by defining a set of variables describing the images: the width and height, the number of channels, and the number of classes:
image_width = img_size[1]
image_height = img_size[2]
num_channels = 3
num_classes = 6
Now let’s turn to building the model. We’ll use a basic UNet architecture for this. This has two sequential branches (encoder and decoder) as well as a number of skip connections. The encoder branch operates like a classic CNN, with convolution and pooling layers. The decoder reverses this, upsampling to increase resolution, with more convolutions. Practically, each branch has a series of steps which either decrease resolution (encoder) or increase it (decoder). The steps on each side match: for example, the encoder could have a step going from a resolution of 64 to 32, and the decoder has a matching step going from 32 to 64.
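Since each pooling step halves the resolution and each upsampling step doubles it, the spatial sizes at each level of a four-step encoder on 128x128 inputs can be computed directly (the decoder retraces these in reverse):

```r
## Resolution at the input and after each of the four pooling steps
resolutions <- 128 / 2^(0:4)
resolutions  # 128 64 32 16 8
```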
We’ll need to use some new layer types for this, so we’ll take a look at these first.
An upsampling layer acts as the opposite of a max-pooling layer. Pooling reduces the size of the input by replacing a window of pixels (usually 2 by 2) with a single pixel containing the maximum of the original 4 values. An upsampling layer increases the resolution of the input according to a defined window (usually 2x2, meaning each original pixel is split into 4). There are two types of upsampling layers:
UpSampling2D
This simply increases the resolution of the input. So an input pixel with the value of 2 will be split into 4, each with the value of 2:
In: [2]
Out: [[2, 2],
[2, 2]]
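The same pixel duplication can be sketched in plain R with kronecker(), which expands each element of a matrix into a block. This is just an illustration of the operation, not part of the model:

```r
## A 2x2 input, upsampled by a factor of 2: each pixel becomes a 2x2 block
x  <- matrix(c(1, 3,
               2, 4), nrow = 2, byrow = TRUE)
up <- kronecker(x, matrix(1, nrow = 2, ncol = 2))
up
#      [,1] [,2] [,3] [,4]
# [1,]    1    1    3    3
# [2,]    1    1    3    3
# [3,]    2    2    4    4
# [4,]    2    2    4    4
```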
Conv2DTranspose
In addition to the upsampling, this layer applies convolutional filters. As a result, the values of the 4 output pixels are based on feature recognition in the coarser image, rather than simply reusing the same value.
Skip connections are used to join the encoder and decoder branches. These join the matching encoder and decoder steps (e.g. the downsampling from 64 to 32 and the upsampling from 32 to 64). This is done through the use of concatenate layers, which link together output from different layers - for example, if you wanted to introduce two different sets of input features through different networks, a concatenate layer could merge these together before linking to the output.
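At the tensor level, concatenation just stacks feature maps along the channel axis. Here is a plain-R sketch (illustration only, using base arrays with channels as the last dimension):

```r
a <- array(1, dim = c(2, 2, 3))  # e.g. 3 channels from the upsampling path
b <- array(2, dim = c(2, 2, 1))  # e.g. 1 channel from the skip connection
## R arrays are column-major with the channel axis last, so c(a, b)
## lays out a's three channel slices first, then b's single slice
ab <- array(c(a, b), dim = c(2, 2, 4))
dim(ab)  # 2 2 4: spatial size unchanged, channels combined
```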
To understand how this works for the UNet model, let’s say our input images are 128x128 pixels:
In practice this is more complex as these skip connections are taking place at every down/up-sampling step.
Let’s actually build the model now so that you can see what this looks like. We’ll use the functional API which will allow us to build this in sections. One thing to note here is that (after the input), we store the layers in an object called x, then add the next layer to this so that it accumulates these:
## To store the blocks for the downward pass
down_layers <- list()
## Input
input <- layer_input(shape = c(image_width, image_height, num_channels))
x <- layer_rescaling(input, 1/255)
Now the first downsampling block. This is the first of four downsampling blocks that make up the encoder. These all have the same format, but the number of convolutional filters doubles at each block:
- A first convolutional layer
- A dropout layer
- A second convolutional layer
- (Store the block)
- A max-pooling layer
## ------------
# Encoder path: forward step 1
x <- layer_conv_2d(x, filters = 16, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 16, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
## Store block
down_layers[[1]] <- x
## Max-pooling
x <- layer_max_pooling_2d(x, pool_size = c(2,2), strides = c(2,2))
## ------------
# Encoder path: forward step 2
x <- layer_conv_2d(x, filters = 32, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 32, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
## Store block
down_layers[[2]] <- x
## Max-pooling
x <- layer_max_pooling_2d(x, pool_size = c(2,2), strides = c(2,2))
## ------------
# Encoder path: forward step 3
x <- layer_conv_2d(x, filters = 64, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 64, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
## Store block
down_layers[[3]] <- x
## Max-pooling
x <- layer_max_pooling_2d(x, pool_size = c(2,2), strides = c(2,2))
## ------------
# Encoder path: forward step 4
x <- layer_conv_2d(x, filters = 128, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 128, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
## Store block
down_layers[[4]] <- x
## Max-pooling
x <- layer_max_pooling_2d(x, pool_size = c(2,2), strides = c(2,2))
## ------------
# Latent space
## Add another dropout
x <- layer_dropout(x, rate = 0.1)
## Convolutional layer on latent space
x <- layer_conv_2d(x, filters = 256, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_conv_2d(x, filters = 256, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
Now the decoder path. Each decoder block starts with a Conv2DTranspose layer to upsample the inputs, increasing the resolution, followed by a concatenate layer that links this to the corresponding downsampling block (this will be the fourth one). The block then finishes with the same pair of convolutional layers and dropout as the encoder blocks.
## ------------
# Decoder path 4
x <- layer_conv_2d_transpose(x, filters = 128, kernel_size = c(2,2),
padding = "same", strides = c(2,2))
x <- layer_concatenate(list(down_layers[[4]], x))
x <- layer_conv_2d(x, filters = 128, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 128, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
## ------------
# Decoder path 3
x <- layer_conv_2d_transpose(x, filters = 64, kernel_size = c(2,2),
padding = "same", strides = c(2,2))
x <- layer_concatenate(list(down_layers[[3]], x))
x <- layer_conv_2d(x, filters = 64, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 64, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
## ------------
# Decoder path 2
x <- layer_conv_2d_transpose(x, filters = 32, kernel_size = c(2,2),
padding = "same", strides = c(2,2))
x <- layer_concatenate(list(down_layers[[2]], x))
x <- layer_conv_2d(x, filters = 32, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 32, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
## ------------
# Decoder path 1
x <- layer_conv_2d_transpose(x, filters = 16, kernel_size = c(2,2),
padding = "same", strides = c(2,2))
x <- layer_concatenate(list(down_layers[[1]], x))
x <- layer_conv_2d(x, filters = 16, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
x <- layer_dropout(x, rate = 0.1)
x <- layer_conv_2d(x, filters = 16, kernel_size = c(3,3),
activation = "relu", kernel_initializer = "he_normal", padding = "same")
Finally, the output layer. This replaces the flatten layer we have previously used, but here a 1x1 convolution forces the output into a shape that is compatible with the masks. The output has 6 channels (one for each class), with a softmax activation to give per-pixel class probabilities.
## ------------
# Output layer
output <- layer_conv_2d(x, filters = num_classes,
kernel_size = c(1,1), activation = "softmax")
With all that done, we can now make the model by linking the input layers and the output:
model <- keras_model(input, output)
Let’s take a look at the model summary:
summary(model)
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ input_layer │ (None, 128, 128, │ 0 │ - │
│ (InputLayer) │ 3) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ rescaling (Rescaling) │ (None, 128, 128, │ 0 │ input_layer[0][0] │
│ │ 3) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d (Conv2D) │ (None, 128, 128, │ 448 │ rescaling[0][0] │
│ │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout (Dropout) │ (None, 128, 128, │ 0 │ conv2d[0][0] │
│ │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_1 (Conv2D) │ (None, 128, 128, │ 2,320 │ dropout[0][0] │
│ │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ max_pooling2d │ (None, 64, 64, │ 0 │ conv2d_1[0][0] │
│ (MaxPooling2D) │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_2 (Conv2D) │ (None, 64, 64, │ 4,640 │ max_pooling2d[0][… │
│ │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_1 (Dropout) │ (None, 64, 64, │ 0 │ conv2d_2[0][0] │
│ │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_3 (Conv2D) │ (None, 64, 64, │ 9,248 │ dropout_1[0][0] │
│ │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ max_pooling2d_1 │ (None, 32, 32, │ 0 │ conv2d_3[0][0] │
│ (MaxPooling2D) │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_4 (Conv2D) │ (None, 32, 32, │ 18,496 │ max_pooling2d_1[0… │
│ │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_2 (Dropout) │ (None, 32, 32, │ 0 │ conv2d_4[0][0] │
│ │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_5 (Conv2D) │ (None, 32, 32, │ 36,928 │ dropout_2[0][0] │
│ │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ max_pooling2d_2 │ (None, 16, 16, │ 0 │ conv2d_5[0][0] │
│ (MaxPooling2D) │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_6 (Conv2D) │ (None, 16, 16, │ 73,856 │ max_pooling2d_2[0… │
│ │ 128) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_3 (Dropout) │ (None, 16, 16, │ 0 │ conv2d_6[0][0] │
│ │ 128) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_7 (Conv2D) │ (None, 16, 16, │ 147,584 │ dropout_3[0][0] │
│ │ 128) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ max_pooling2d_3 │ (None, 8, 8, 128) │ 0 │ conv2d_7[0][0] │
│ (MaxPooling2D) │ │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_4 (Dropout) │ (None, 8, 8, 128) │ 0 │ max_pooling2d_3[0… │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_8 (Conv2D) │ (None, 8, 8, 256) │ 295,168 │ dropout_4[0][0] │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_9 (Conv2D) │ (None, 8, 8, 256) │ 590,080 │ conv2d_8[0][0] │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_transpose │ (None, 16, 16, │ 131,200 │ conv2d_9[0][0] │
│ (Conv2DTranspose) │ 128) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ concatenate │ (None, 16, 16, │ 0 │ conv2d_7[0][0], │
│ (Concatenate) │ 256) │ │ conv2d_transpose[… │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_10 (Conv2D) │ (None, 16, 16, │ 295,040 │ concatenate[0][0] │
│ │ 128) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_5 (Dropout) │ (None, 16, 16, │ 0 │ conv2d_10[0][0] │
│ │ 128) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_11 (Conv2D) │ (None, 16, 16, │ 147,584 │ dropout_5[0][0] │
│ │ 128) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_transpose_1 │ (None, 32, 32, │ 32,832 │ conv2d_11[0][0] │
│ (Conv2DTranspose) │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ concatenate_1 │ (None, 32, 32, │ 0 │ conv2d_5[0][0], │
│ (Concatenate) │ 128) │ │ conv2d_transpose_… │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_12 (Conv2D) │ (None, 32, 32, │ 73,792 │ concatenate_1[0][… │
│ │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_6 (Dropout) │ (None, 32, 32, │ 0 │ conv2d_12[0][0] │
│ │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_13 (Conv2D) │ (None, 32, 32, │ 36,928 │ dropout_6[0][0] │
│ │ 64) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_transpose_2 │ (None, 64, 64, │ 8,224 │ conv2d_13[0][0] │
│ (Conv2DTranspose) │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ concatenate_2 │ (None, 64, 64, │ 0 │ conv2d_3[0][0], │
│ (Concatenate) │ 64) │ │ conv2d_transpose_… │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_14 (Conv2D) │ (None, 64, 64, │ 18,464 │ concatenate_2[0][… │
│ │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_7 (Dropout) │ (None, 64, 64, │ 0 │ conv2d_14[0][0] │
│ │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_15 (Conv2D) │ (None, 64, 64, │ 9,248 │ dropout_7[0][0] │
│ │ 32) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_transpose_3 │ (None, 128, 128, │ 2,064 │ conv2d_15[0][0] │
│ (Conv2DTranspose) │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ concatenate_3 │ (None, 128, 128, │ 0 │ conv2d_1[0][0], │
│ (Concatenate) │ 32) │ │ conv2d_transpose_… │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_16 (Conv2D) │ (None, 128, 128, │ 4,624 │ concatenate_3[0][… │
│ │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ dropout_8 (Dropout) │ (None, 128, 128, │ 0 │ conv2d_16[0][0] │
│ │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_17 (Conv2D) │ (None, 128, 128, │ 2,320 │ dropout_8[0][0] │
│ │ 16) │ │ │
├───────────────────────┼───────────────────┼─────────────┼────────────────────┤
│ conv2d_18 (Conv2D) │ (None, 128, 128, │ 102 │ conv2d_17[0][0] │
│ │ 6) │ │ │
└───────────────────────┴───────────────────┴─────────────┴────────────────────┘
Total params: 1,941,190 (7.41 MB)
Trainable params: 1,941,190 (7.41 MB)
Non-trainable params: 0 (0.00 B)
This model has 1.94 million weights or parameters to train. This is fairly common with any large CNN-type model, and is why we generally need a large amount of data to train.
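The parameter counts in this summary can be checked by hand: a convolutional layer has kernel height x kernel width x input channels x filters weights, plus one bias per filter. For the first two convolutional layers:

```r
## conv2d: 3x3 kernels over 3 input channels (RGB), 16 filters
first_conv <- 3 * 3 * 3 * 16 + 16
first_conv   # 448

## conv2d_1: 3x3 kernels over 16 input channels, 16 filters
second_conv <- 3 * 3 * 16 * 16 + 16
second_conv  # 2320
```

These match the 448 and 2,320 parameters reported for conv2d and conv2d_1 in the summary.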
We can also visualize the architecture. You should be able to see a ‘C’-like structure between the downward and upward paths of the model. In the original paper, this was shown rotated 90 degrees to the left, hence the name ‘U’Net. (Note that you might need to save this and zoom in to see the detail.)
plot(model)
We’ll use the accuracy to assess the model (alternatively, we could use the intersection over union).
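The intersection over union (IoU) mentioned here is easy to compute by hand for a single class. A toy illustration in plain R (not part of the model code):

```r
## Ground truth and predicted class codes for six pixels (Road = 2)
truth <- c(2, 2, 0, 1, 2, 3)
pred  <- c(2, 0, 0, 1, 2, 2)
k <- 2  # class of interest

## IoU = pixels both call class k / pixels either calls class k
iou <- sum(pred == k & truth == k) / sum(pred == k | truth == k)
iou  # 2 matching pixels out of 4 flagged by either: 0.5
```

For this lab, though, we’ll stick with plain accuracy.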
metrics = "accuracy"
We’ll set the optimizer to RMSprop with a learning rate of 1e-3:
optim = optimizer_rmsprop(learning_rate = 1e-3)
Let’s compile the model and create a callback to save the best performing set of weights during training:
model |>
compile(optimizer = optim,
loss = "sparse_categorical_crossentropy",
metrics = metrics)
callbacks <- list(
callback_model_checkpoint("lulc_segmentation.keras",
save_best_only = TRUE))
With all that in place, we can train the model. We’ll use batches of 32 images (as set in make_dataset above), and run for 25 epochs.
history <- model |> fit(
train_dataset,
epochs = 25,
callbacks = callbacks,
validation_data = validation_dataset
)
Epoch 1/25
48/48 - 27s - 558ms/step - accuracy: 0.4014 - loss: 1.3712 - val_accuracy: 0.5243 - val_loss: 1.1535
Epoch 2/25
48/48 - 25s - 525ms/step - accuracy: 0.5256 - loss: 1.1602 - val_accuracy: 0.6087 - val_loss: 1.1859
Epoch 3/25
48/48 - 26s - 541ms/step - accuracy: 0.6187 - loss: 1.0193 - val_accuracy: 0.6388 - val_loss: 0.9687
Epoch 4/25
48/48 - 28s - 577ms/step - accuracy: 0.6486 - loss: 0.9410 - val_accuracy: 0.6860 - val_loss: 0.8606
Epoch 5/25
48/48 - 27s - 571ms/step - accuracy: 0.6658 - loss: 0.9032 - val_accuracy: 0.6963 - val_loss: 0.8223
Epoch 6/25
48/48 - 26s - 549ms/step - accuracy: 0.6880 - loss: 0.8588 - val_accuracy: 0.7074 - val_loss: 0.8047
Epoch 7/25
48/48 - 26s - 546ms/step - accuracy: 0.7137 - loss: 0.8118 - val_accuracy: 0.6007 - val_loss: 1.1324
Epoch 8/25
48/48 - 26s - 551ms/step - accuracy: 0.7244 - loss: 0.7872 - val_accuracy: 0.6884 - val_loss: 0.8895
Epoch 9/25
48/48 - 26s - 545ms/step - accuracy: 0.7360 - loss: 0.7586 - val_accuracy: 0.6670 - val_loss: 0.9045
Epoch 10/25
48/48 - 26s - 537ms/step - accuracy: 0.7503 - loss: 0.7196 - val_accuracy: 0.6533 - val_loss: 0.9758
Epoch 11/25
48/48 - 26s - 546ms/step - accuracy: 0.7603 - loss: 0.6939 - val_accuracy: 0.7539 - val_loss: 0.7051
Epoch 12/25
48/48 - 26s - 551ms/step - accuracy: 0.7661 - loss: 0.6694 - val_accuracy: 0.6518 - val_loss: 1.0766
Epoch 13/25
48/48 - 26s - 543ms/step - accuracy: 0.7680 - loss: 0.6641 - val_accuracy: 0.7783 - val_loss: 0.6228
Epoch 14/25
48/48 - 27s - 559ms/step - accuracy: 0.7753 - loss: 0.6393 - val_accuracy: 0.7483 - val_loss: 0.7113
Epoch 15/25
48/48 - 27s - 552ms/step - accuracy: 0.7780 - loss: 0.6283 - val_accuracy: 0.7812 - val_loss: 0.6273
Epoch 16/25
48/48 - 27s - 556ms/step - accuracy: 0.7819 - loss: 0.6222 - val_accuracy: 0.7850 - val_loss: 0.6125
Epoch 17/25
48/48 - 27s - 566ms/step - accuracy: 0.7847 - loss: 0.6080 - val_accuracy: 0.7556 - val_loss: 0.6768
Epoch 18/25
48/48 - 27s - 558ms/step - accuracy: 0.7869 - loss: 0.6016 - val_accuracy: 0.7657 - val_loss: 0.6894
Epoch 19/25
48/48 - 27s - 553ms/step - accuracy: 0.7885 - loss: 0.5976 - val_accuracy: 0.7865 - val_loss: 0.6320
Epoch 20/25
48/48 - 26s - 550ms/step - accuracy: 0.7898 - loss: 0.5871 - val_accuracy: 0.7630 - val_loss: 0.6488
Epoch 21/25
48/48 - 26s - 545ms/step - accuracy: 0.7955 - loss: 0.5752 - val_accuracy: 0.7914 - val_loss: 0.5822
Epoch 22/25
48/48 - 26s - 544ms/step - accuracy: 0.7973 - loss: 0.5687 - val_accuracy: 0.7904 - val_loss: 0.5740
Epoch 23/25
48/48 - 26s - 541ms/step - accuracy: 0.8012 - loss: 0.5580 - val_accuracy: 0.8032 - val_loss: 0.5649
Epoch 24/25
48/48 - 26s - 540ms/step - accuracy: 0.8009 - loss: 0.5583 - val_accuracy: 0.8050 - val_loss: 0.5456
Epoch 25/25
48/48 - 26s - 551ms/step - accuracy: 0.8060 - loss: 0.5425 - val_accuracy: 0.8052 - val_loss: 0.5543
And let’s plot the history
plot(history)
The loss curve is noisy but shows a fairly consistent decline. As it has not yet plateaued, it may be worth increasing the number of epochs to train for longer.
To finish up, we’ll take a look at how well the model can segment an image. As we don’t have a separate testing set, we’ll simply use one of the images from the validation set. The steps here are to reload the best set of weights, read in an image and its mask, predict the per-pixel class probabilities, and convert these to a predicted mask.
First, reload the best-performing weights saved by the checkpoint callback:
model <- load_model("lulc_segmentation.keras")
Next, read in the first validation image and its matching mask:
i = 1
test_image <- val_paths$input[i] |>
tf_read_image_and_resize("jpeg", channels = 3L)
test_mask <- val_paths$target[i] |>
tf_read_image_and_resize("png", channels = 1L)
Now use the model to estimate the probability of each class for each pixel:
predicted_mask_probs <-
model(test_image[tf$newaxis, , , ])
Take the argmax across these probabilities to get the most likely class for each pixel:
predicted_mask <-
tf$argmax(predicted_mask_probs, axis = -1L)
Finally, plot the image, the true mask, and the predicted mask side by side:
par(mfrow = c(1, 3))
display_image_tensor(test_image)
display_target_tensor(test_mask)
display_target_tensor(predicted_mask)
The resulting segmentation is far from perfect here, but given the size of the input data and the relatively short training period, it is already starting to capture the spatial patterns in this image. The next steps are likely to be: